1 Introduction

1.1 Background

1.1.1 Phytocannabinoids

A phytocannabinoid is a cannabinoid that occurs naturally in plants, such as Cannabis sativa. The word “phyto” means “plant”, and the prefix “phyto-” is used in biochemistry to refer to compounds derived from plants. These compounds are known to occur in several plant species besides cannabis. The distinction with the prefix “phyto-” must be made because the human body also produces cannabinoids, referred to as endocannabinoids due to their endogenous nature.

1.1.2 Cannabidiol (CBD)

Cannabidiol (CBD) is one of the numerous phytocannabinoids found in the Cannabis (Cannabis sativa) plant. However, it is not responsible for the “high” commonly associated with use of cannabis - that responsibility falls on delta-9 tetrohydrocannabinol (THC). CBD is a non-psychoactive compound, and has been shown to have a wide range of potential therapeutic properties, including anti-inflammatory, antioxidant, and neuroprotective effects1,2. It has showed promise as a possible treatment for a variety of conditions, including epilepsy, anxiety, and chronic pain3.

1.1.3 SARS-CoV-2

A highly contagious virus, SARS-CoV-2 is the driving force behind the COVID-19 pandemic. It is a novel coronavirus that was first identified in December 2019 in Wuhan, China, and is primarily spread through respiratory droplets. Those who have been infected with the virus can experience a wide range of symptoms, from mild to severe, with the worst outcome being death. Due to its extreme impact on public health, global economy, and daily life, significant amounts of research went into treatment/prevention methods for the virus from 2020 to 2022.

1.2 Primary Research

1.2.1 Quick Summary of the Paper

Cannabidiol Inhibits SARS-CoV-2 Replication through Induction of the Host ER Stress and Innate Immune Responses discusses the effects of CBD treatment in SARS-CoV-2 infected cells and mice, through its action of preventing replication of the virus. The researchers found that these effects are apparent in lung epithelial cells which is a critical finding, given that SARS-CoV-2 manifests itself with many respiratory issues4.

1.2.2 Data Overview

The RNASeq data from this paper is available on the Gene Expression Omnibus (GEO) under the accession number GSE168797. More information about the data is summarized in Table 1.1.

Table 1.1: Summary of the Cannabidiol as COVID Treatment RNASeq Data.
Attribute Value
Title Cannabidiol Inhibits SARS-CoV-2 Replication through Induction of the Host ER Stress and Innate Immune Responses
Organism Homo sapiens
Paper Nguyen, L. C., Yang, D., Nicolaescu, V., Best, T. J., Gula, H., Saxena, D., Gabbard, J. D., Chen, S. N., Ohtsuki, T., Friesen, J. B., Drayman, N., Mohamed, A., Dann, C., Silva, D., Robinson-Mailman, L., Valdespino, A., Stock, L., Suárez, E., Jones, K. A., Azizi, S. A., … Rosner, M. R. (2022). Cannabidiol inhibits SARS-CoV-2 replication through induction of the host ER stress and innate immune responses. Science advances, 8(8), eabi6110. https://doi.org/10.1126/sciadv.abi6110
Samples 24
Status Public on Jan 20, 2022

2 Data Preparation

The initial dataset from GEO had 57 832 genes, but after cleaning the dataset, we were left with 20 438 entries for subsequent analyses. The results of processing and normalization were visualized using a boxplot (Figure 2.1) and a density plot (Figure 2.2) to see the distribution of the data. These plots were constructed using the ggplot2 R package5.

2.1 Boxplot

Boxplot Showing the Distribution of the Normalized RNASeq Data

Figure 2.1: Boxplot Showing the Distribution of the Normalized RNASeq Data

2.2 Density Plot

Density Plot Showing the Distribution of the Normalized RNASeq Data

Figure 2.2: Density Plot Showing the Distribution of the Normalized RNASeq Data

3 Differential Expression Analysis

We examined the differential expression between the healthy and SARS-CoV-2 infected A549 cells, where both groups were treated with CBD. The analysis was performed using the edgeR package in R6. The results were visualized using a volcano plot (Figure 3.1) with highlights on three genes mentioned in the original publication: ATF6, EIF2AK3, and ERN14.

Volcano Plot of Differential Expression between Healthy vs. Infected A549 Cells Treated with CBD

Figure 3.1: Volcano Plot of Differential Expression between Healthy vs. Infected A549 Cells Treated with CBD

4 Thresholded Enrichment Analysis (Over-representation Analysis)

To examine the biological processes that either have increased or decreased activity from the treatment, we performed an over-representation analysis (ORA) using the gprofiler2 R package7. This method of analysis used a list of the significantly up- and down-regulated genes from the previous DE analysis which passed the adjusted p-value threshold of 0.05. The results of this analysis gave us insight into which pathways were most affected through the CBD treatment.

The ORA was ran three times with differing input lists to compare and contrast the results of providing gprofiler2 with up-regulated terms, down-regulated terms, and both up and downregulated terms. We display the top 10 results from the upregulation (Table 4.1), and downregulation (Table 4.2) analyses.

4.1 Upregulated Pathways

Table 4.1: Top 10 upregulated pathways from the differential expression analysis.
term_id term_name p_value intersection_size term_size
GO:0045596 negative regulation of cell differentiation 0e+00 78 434
GO:0071559 response to transforming growth factor beta 0e+00 48 207
GO:0071560 cellular response to transforming growth factor beta stimulus 0e+00 46 201
GO:0010586 miRNA metabolic process 0e+00 29 92
GO:0007179 transforming growth factor beta receptor signaling pathway 0e+00 42 175
GO:2000628 regulation of miRNA metabolic process 0e+00 26 77
GO:0009896 positive regulation of catabolic process 0e+00 78 474
GO:0017015 regulation of transforming growth factor beta receptor signaling pathway 1e-07 32 122
GO:0043161 proteasome-mediated ubiquitin-dependent protein catabolic process 1e-07 64 363
GO:1903844 regulation of cellular response to transforming growth factor beta stimulus 1e-07 32 124

4.2 Downregulated Pathways

Table 4.2: Top 10 downregulated pathways from the differential expression analysis.
term_id term_name p_value intersection_size term_size
GO:0000819 sister chromatid segregation 0 43 215
GO:0000070 mitotic sister chromatid segregation 0 39 176
GO:0140014 mitotic nuclear division 0 44 250
GO:0098813 nuclear chromosome segregation 0 46 275
GO:1905818 regulation of chromosome separation 0 24 70
REAC:R-HSA-68882 Mitotic Anaphase 0 43 228
REAC:R-HSA-2555396 Mitotic Metaphase and Anaphase 0 43 229
REAC:R-HSA-2467813 Separation of Sister Chromatids 0 39 188
GO:0051304 chromosome separation 0 24 76
GO:0007059 chromosome segregation 0 49 360

5 Non-thresholded Enrichment Analysis (Gene Set Enrichment Analysis)

We have 20236 genes that were neither significantly upregulated or downregulated; that is 99.01% of the total genes.

Given this large proportion of genes we do not gain information from via the ORA, we likely miss out on highlighting several biological processes that may be affected by these genes. To combat this, we will perform a gene set enrichment analysis (GSEA) to identify the pathways that may be further affected by this group of genes8.

Since GSEA operates under the assumption of microarray data, we must apply some transformations (given by Equation (5.1)) to our data to specify the rank of each gene.

\[\begin{equation} \text{Rank} = -\log_{10}(p) \times \text{sign}(LFC) \tag{5.1} \end{equation}\]

Where:

5.1 Loading the Gene Sets from the Bader Lab

GSEA comes with its own set of gene sets for the analysis, but they’re not frequently updated (i.e., the sets are updated on a near-yearly basis) meaning that some information could be lost if its tied to a newer annotation. The .gmt from the Bader Lab contains more pathways and is updated on a monthly basis allowing for more information to be pulled.

For our purposes, we will be using the gene set that does not include inferred electronic annotations (IEAs).

We will proceed with using Human_GOBP_AllPathways_noPFOCR_no_GO_iea_March_01_2024_symbol.gmt as our gene set file.

5.2 Performing the GSEA

We have downloaded the GSEA executable to downloads/GSEA_X.X.X (where X.X.X is the version number) and made reference to the entrypoint for the program in gsea_executable. Before we perform the GSEA, we must store reference to the relevant directories and files the analysis will be using.

With everything set up, we can now perform our GSEA.

5.3 GSEA Results

5.4 Insights

1. What method was used to perform the GSEA?

The GSEA was performed using version 4.3.3 of GSEA for the command line8. This enrichment analysis was performed on the results of the differential expression analysis examining the expression effects of CBD treatment on healthy versus SARS-CoV-2 infected A549 cells. This data was transformed to create a rank file based on each gene’s p-value and log fold change. The initial, raw RNASeq data was download from GEO with the accession number GSE168797.

2. What gene sets did you use for your enrichment analysis?

The gene set used for the enrichment analysis was obtained from the Bader Lab’s curated collection. The gene set file used was Human_GOBP_AllPathways_noPFOCR_no_GO_iea_March_01_2024_symbol.gmt. The .gmt was the latest release (March 01, 2024 at the time of writing) and contained gene sets for biological pathways that did not include inferred electronic annotations (IEAs).

3. Provide a summary of your enrichment results.

Table 5.1: Summary of the GSEA results for the CBD treatment of SARS-CoV-2 infection.
Condition Total Gene Sets Gene Sets (FDR < 25%) Gene Sets (p-value < 1%) Gene Sets (p-value < 5%)
CBD_Infect 3914 1493 677 1195
CBD_Healthy 2106 1395 899 1151

4. Compare the results of this non-thresholded analysis to the thresholded analysis performed in Assignment 2.

Table 5.2: Top 5 Terms from the ORA Results.
Upregulated Downregulated
negative regulation of cell differentiation sister chromatid segregation
response to transforming growth factor beta mitotic sister chromatid segregation
cellular response to transforming growth factor beta stimulus mitotic nuclear division
miRNA metabolic process nuclear chromosome segregation
transforming growth factor beta receptor signaling pathway regulation of chromosome separation
Table 5.3: Top 5 Terms from the GSEA Results.
CBD_Infect CBD_Healthy
HALLMARK_TNFA_SIGNALING_VIA_NFKB AEROBIC RESPIRATION
REGULATION OF FAT CELL DIFFERENTIATION HALLMARK_E2F_TARGETS
WHITE FAT CELL DIFFERENTIATION RESPIRATORY ELECTRON TRANSPORT, ATP SYNTHESIS BY CHEMIOSMOTIC COUPLING, AND HEAT PRODUCTION BY UNCOUPLING PROTEINS.
DDX58 IFIH1-MEDIATED INDUCTION OF INTERFERON-ALPHA BETA THE CITRIC ACID (TCA) CYCLE AND RESPIRATORY ELECTRON TRANSPORT
TNFR1-INDUCED PROAPOPTOTIC SIGNALING CELLULAR RESPIRATION

5. Is comparing a non-thresholded vs. a thresholded analysis straightforward? Why or why not?

It’s not a straightforward comparison because in a GSEA analysis, all of the genes in the initial DE results are factored. On the other hand, the ORA only accounts for the significantly upregulated/downregulated genes. This means we lose out on a lot of information in the pathways that are shown as the output.

For example, a non-significantly expressed gene could be part of a pathway that is significantly affected by the treatment.

Overall, ORA results only factor in genes that are expressed beyond a threshold, whereas GSEA factors in everything in the dataset. This makes it difficult to compare the two analyses directly.

6 Visualization using Cytoscape

We will use Cytoscape to create an enrichment map (EM) of our GSEA results. The enrichment map will allow us to visualize the relationships between the gene sets that were significantly enriched in our analysis.

6.1 Enrichment Map Setup

Prior to creating our EM, we will setup the thresholds (Table 6.1) for our analysis. The thresholds will allow us to filter out gene sets that aren’t as significant as we need for analysis.

Table 6.1: Thresholds used for the Enrichment Map analysis.
PValue QValue Similarity
1 0.05 0.375

The p-value threshold will ensure that only gene sets with a p-value of at most 1.0 will be included in the analysis. In a similar fashion, the q-value threshold is used to limit the gene sets in our analysis to those with \(q-value < 0.05\). The similarity threshold is used to filter out gene sets that are not similar enough to be connected in the enrichment map. The similarity metric used is Combined.

With the thresholds and input data set, we can now proceed with creating our EM.

6.2 Using Cytoscape

Our network analysis will be ran using Cytoscape and apps that are available through it9. As a result, we must first ensure Cytoscape is running and that we can connect to it through its REST API (CyREST)10.

##  Cytoscape API Version: v1
##  Cytoscape Version Info: 3.10.2

The EM is created through the enrichmentmap app available through the Cytoscape App Store. Since it runs through Cytoscape, we must first ensure that Cytoscape is running and that we can connect to it through the REST API (CyREST). Overall, our Cytoscape setup will involve the following steps:

  1. Ensure EnrichmentMap, AutoAnnotate, and any other necessary apps are installed in Cytoscape.
  2. Upload the required files to Cytoscape.
  3. Create the enrichment map.
  4. Export an image of the enrichment map.

6.2.1 Installing Cytoscape Apps

We use the installApp function provided by RCy3 to install EnrichmentMap and AutoAnnotate.

# Wrapper function to install Cytoscape apps using RCy3
#
# Args:
#   cyrest_url: The base URL of the Cytoscape REST API
#   apps: A list of apps to install
#
# Returns:
#   A list of the newly installed apps
install_cytoscape_apps <- function(cyrest_url, apps) {

    # Load a list of the currently installed apps
    installed_apps <- RCy3::getInstalledApps(base.url = cyrest_url)
    new_apps <- c()

    for (app in apps) {
        # Check if the app is already installed
        if (!any(grepl(tolower(app), tolower(installed_apps)))) {
            RCy3::installApp(app, base.url = cyrest_url)
            new_apps <- c(new_apps, app)
        } else {
            message(sprintf("App '%s' is already installed.", app))
        }
    }

    return(new_apps)
}

Table: (#tab:install-cytoscape-apps)Newly installed Cytoscape apps.

6.2.2 Uploading Data to Cytoscape

To perform our analysis, we need to provide the following files to Cytoscape:

  • The gene set file in .gmt format.
  • The enrichment results file.
  • The expression data file.
  • The ranks file.

Since Cytoscape is running on the host operating system, we have to provide it with the input files that were generated in the Docker container. We will use the CyREST API to upload these files to Cytoscape.

# Function to upload data to Cytoscape using the provided cyrest_url
#
# Args:
#   cyrest_url: The base URL of the Cytoscape REST API
#   files: A list of files to upload
#
# Returns:
#   A logical value indicating whether the upload was successful or not
upload_data_to_cytoscape <- function(cyrest_url, files) {
    # Store the uploaded file paths
    uploaded_paths <- c()

    for (file in files) {
        # Build the POST request
        bname <- basename(file)
        r <- httr::POST(
            url = paste0(
                cyrest_url, "/enrichmentmap/textfileupload?fileName=", bname
            ),
            config = list(),
            body = list(file = httr::upload_file(file)),
            encode = "multipart",
            handle = NULL
        )

        # Check if the upload was successful
        uploaded_paths <- c(uploaded_paths, httr::content(r, "parsed")$path)
    }

    return(uploaded_paths)
}

6.3 Create the Enrichment Map

With the Cytoscape Apps installed and the data imported, we can now move on to creating our EMs. We will create two EMs: one with a q-value threshold of 0.05 (not shown) and another with a q-value threshold of 0.01 (Figure 6.1). Both EMs will use a p-value threshold of 1.0.

# Function to create an enrichment map using the provided cyrest_url
#
# Args:
#   cur_model_name: The name of the current model
#   pvalue_gsea_threshold: The p-value threshold for the GSEA analysis
#   qvalue_gsea_threshold: The q-value threshold for the GSEA analysis
#   similarity_threshold: The similarity threshold for the enrichment map
#   similarity_metric: The similarity metric for the enrichment map
#   gsea_ranks_file: The file containing the GSEA ranks
#   gsea_results_filename: The file containing the GSEA results
#   expression_file_fullpath: The full path to the expression data file
#   gmt_gsea_file: The file containing the GSEA gene sets
#   current_base: The base URL of the Cytoscape REST API
#
# Returns:
#   The response from the Cytoscape REST API
create_enrichment_map <- function(
    cur_model_name, pvalue_gsea_threshold, qvalue_gsea_threshold,
    similarity_threshold, similarity_metric, gsea_ranks_file,
    gsea_results_filename, expression_file_fullpath, gmt_gsea_file,
    current_base
) {
    # Construct the network name
    current_network_name <- paste(
        cur_model_name,
        pvalue_gsea_threshold,
        qvalue_gsea_threshold,
        sep = "_"
    )

    # Construct the EM command
    em_command <- paste(
        'enrichmentmap build analysisType="gsea"',
        "gmtFile=", gmt_gsea_file,
        "pvalue=", pvalue_gsea_threshold,
        "qvalue=", qvalue_gsea_threshold,
        "similaritycutoff=", similarity_threshold,
        "coefficients=", similarity_metric,
        "ranksDataset1=", gsea_ranks_file,
        "enrichmentsDataset1=", gsea_results_filename,
        "filterByExpressions=false",
        "expressionDataset1=", expression_file_fullpath,
        "gmtFile=", gmt_gsea_file,
        sep = " "
    )

    # Execute the EM command
    response <- RCy3::commandsGET(em_command, base.url = current_base)

    # Initialize network SUID
    current_network_suid <- 0

    # Check if the command execution failed
    if (grepl(pattern = "Failed", response)) {
        return(paste(response))
    } else {
        current_network_suid <- response
    }

    # Check if the network name is unique
    current_names <- RCy3::getNetworkList(base.url = current_base)

    # If the name already exists, prepend the SUID to the name
    if (current_network_name %in% current_names) {
        current_network_name <- paste(
            current_network_suid,
            current_network_name,
            sep = "_"
        )
    }

    # Rename the network
    response <- RCy3::renameNetwork(
        title = current_network_name,
        network = as.numeric(current_network_suid),
        base.url = current_base
    )

    return(response)
}

# 1. Initial EM (p = 1.0, q = 0.05)
em_response <- create_enrichment_map(
    analysis_name, pvalue_gsea_threshold, qvalue_gsea_threshold,
    similarity_threshold, similarity_metric, rank_file_host,
    enrichment_results_host, expression_data_host, gmt_file_host,
    cyrest_url
)
em_response_suid <- em_response$network

# 2. More Stringent EM (p = 1.0, q = 0.01)
analysis_name_stringent <- paste0(analysis_name, "_stringent")
em_response_stringent <- create_enrichment_map(
    analysis_name, pvalue_gsea_threshold, 0.01,
    similarity_threshold, similarity_metric, rank_file_host,
    enrichment_results_host, expression_data_host, gmt_file_host,
    cyrest_url
)
em_response_stringent_suid <- em_response_stringent$network

Unfortunately using Docker, we cannot directly access the images of the networks we create in Cytoscape. This is because of the fundamental separation of filesystems between Docker and the host system. The issues will differ based on the function of RCy3 used, but the reason behind said issues will be the same.

  • RCy3::exportImage will try to export the image in the current working directory (/home/rstudio/projects/Assignment_3), which is not accessible from the host system (and therefore Cytoscape).
  • RCy3::commandsPOST will try to save the image in the Cytoscape session directory, which is not accessible from this Docker container.
    Enrichment Map of the GSEA results.

    Figure 6.1: Enrichment Map of the GSEA results.

6.4 Initial Analysis

After creating two basic enrichment maps (Figure 6.1 shows the more stringent one), we can now examine some basic statistics of the size of each to see how thresholds affect the network.

Table 6.3: Size of the Enrichment Maps.
Network Nodes Edges
Initial 1444 13256
Stringent 864 9370

Notice that after setting a more stringent threshold, we see that relatively few terms remain, along with their regulation status. It is interesting that there is a great number of downregulated terms, possibly indicating that CBD treatment has a greater effect in downregulation.

6.5 Annotating the Enrichment Map

To add more information to the visual, we will annotate the map using the AutoAnnotate app available in Cytoscape11. Specifically, the annotation will use the annotate-clusterBoosted method. As a baseline annotation effort, we first summarized the stringent network and then ran autoannotate on it, yielding the annotated network shown in Figure 6.2.

Annotated Enrichment Map of the GSEA results.

Figure 6.2: Annotated Enrichment Map of the GSEA results.

Before continuing further analysis we will remove the nodes that are not connected to any other nodes in the network. This will help us focus on the most relevant pathways in the network that have several of their components modified through treatment. To accomplish this, we will use the filter command in Cytoscape, and set this filter to only select the nodes that are within a distance of 3 of a node with a degree of 3 or more.

# Create a filter to limit the number of nodes.
#
# Args:
#   current_base: The base URL of the Cytoscape REST API
#   filter_json: The JSON specifying the filter
#   filter_name: The name of the filter
#   apply: A logical value indicating whether to apply the filter
#
# Returns:
#   The name of the filter
create_filter <- function(
    current_base, filter_json, filter_name, apply = TRUE
) {

    # Create the request body
    req_body <- jsonlite::toJSON(
        list(
            json = jsonlite::toJSON(filter_json, auto_unbox = TRUE),
            name = filter_name,
            apply = apply
        ),
        auto_unbox = TRUE
    )

    # Create the filter
    r <- httr::POST(
        url = paste0(
            current_base,
            "/commands",
            "/filter",
            "/create"
        ),
        body = jsonlite::fromJSON(req_body),
        encode = "json",
        handle = NULL
    )

    # Check the status of the response
    if (r$status != 200) {
        stop(sprintf("Failed to create filter: Code %s", r$status))
    }

    return(filter_name)
}

# Grab the SUIDs of the selected nodes
#
# Args:
#   current_base: The base URL of the Cytoscape REST API
#
# Returns:
#   The SUIDs of the selected nodes
get_node_suids <- function(current_base) {

    # Create the request body
    req_body <- jsonlite::toJSON(
        list(
            nodeList = "selected"
        ),
        auto_unbox = TRUE
    )

    # Create the filter
    r <- httr::POST(
        url = paste0(
            current_base,
            "/commands",
            "/node",
            "/list"
        ),
        body = jsonlite::fromJSON(req_body),
        encode = "json",
        handle = NULL
    )

    # Check the status of the response
    if (r$status != 200) {
        stop(sprintf("Failed to create filter: Code %s", r$status))
    }

    # Parse the response
    selected_nodes <- jsonlite::fromJSON(httr::content(r, "text"))

    return(selected_nodes)
}

# Grab the specified properties of the nodes
#
# Args:
#   current_base: The base URL of the Cytoscape REST API
#   properties: The properties to get
#   selected: Whether to query selected nodes or not
#
# Returns:
#   The properties of the selected nodes
get_node_properties <- function(current_base, properties, selected) {

    # Concatenate the properties vector into one string for query
    properties_str <- paste(properties, collapse = ", ")

    # Specify which group of nodes to query
    node_group <- ifelse(selected, "selected", "all")

    # Create the request body
    req_body <- jsonlite::toJSON(
        list(
            nodeList = node_group,
            propertyList = properties_str
        ),
        auto_unbox = TRUE
    )

    # Create the filter
    r <- httr::POST(
        url = paste0(
            current_base,
            "/commands",
            "/node",
            "/get%20properties"
        ),
        body = jsonlite::fromJSON(req_body),
        encode = "json",
        handle = NULL
    )

    # Check the status of the response
    if (r$status != 200) {
        stop(sprintf("Failed to create filter: Code %s", r$status))
    }

    # Parse the response
    selected_nodes <- jsonlite::fromJSON(httr::content(r, "text"))

    return(selected_nodes)
}

# Parse the properties of the nodes into a data frame
#
# Args:
#   properties_list: The list of properties
#
# Returns:
#   A data frame with the properties of the nodes
parse_node_properties <- function(properties_list) {
    # Initialize the data frame
    node_properties_df <- data.frame()

    # Iterate over the properties
    for (i in seq_len(nrow(properties_list))) {

        # Get the SUID of the current node
        node_suid <- properties_list[i, ]$SUID

        # Select the properties for the given node
        curr_properties <- properties_list[i, ]$visualProperties
        curr_properties <- curr_properties[[1]]

        # Create a new row for the data frame
        new_row <- t(data.frame(curr_properties))
        rownames(new_row) <- NULL
        colnames(new_row) <- new_row[1, ]
        new_row <- new_row[2, ]

        # Add the SUID to the new row
        new_row$SUID <- node_suid

        # Add the new row to the data frame
        node_properties_df <- rbind(node_properties_df, new_row)
    }

    return(node_properties_df)
}

# Define the filter JSON
filter_json <- list(
    id = "CompositeFilter",
    parameters = list(
        type = "ALL"
    ),
    transformers = list(
        list(
            id = "TopologyFilter",
            parameters = list(
                predicate = "GREATER_THAN_OR_EQUAL",
                distance = 3,
                threshold = 3,
                type = "ALL"
            ),
            transformers = list()
        )
    )
)

# Create the filter
filter_response <- create_filter(cyrest_url, filter_json, "limitnodes", TRUE)

By filtering, we can see which terms pass our filter and are likely deeply connected to other terms in the network. To fetch these terms (Table 6.4), we will look up the Label property of each term in our filter.

Table 6.4: Selected nodes in the Enrichment Map.
SUID NODE_LABEL Regulation
6357922 pathogen interaction human Upregulated
6357932 tnf related tweak Upregulated
6357876 regulation interferon production Upregulated
6357884 sting regulators rig Upregulated
6357886 nfkb zbp1 dai Upregulated
6357904 glycosyl process glycoside Downregulated
6357840 apc mediated degradation Downregulated
6357842 srp protein synthesis Downregulated
6357970 bard1 signaling events Downregulated
6357908 unwinding geometric change Downregulated
6357972 fatty metabolism oxidation Downregulated
6357844 coupled electron nadh Downregulated
6357912 fructose mannose degradation Downregulated
6357848 ner base excision Downregulated
6357850 catabolic diphosphate process Downregulated
6357852 chromatid regulation negative Downregulated
6357916 oxidase assembly respiratory Downregulated
6357982 nucleobase metabolic process Downregulated
6357854 ribonucleoside monophosphate xanthinuria Downregulated
6357918 pid atr stress Downregulated
6357856 presynaptic homologous due Downregulated
6357858 methyl 3 coa Downregulated
6357860 bisphosphate process acyl Downregulated
6357988 fanconi anemia pathway Downregulated
6357924 post tubulin folding Downregulated
6357866 postmitotic pore npc Downregulated
6357930 fgfr2 receptor activation Downregulated
6357934 gpi anchor glycolipid Downregulated
6357870 oxidation lipid fatty Downregulated
6357878 tricistronic rrna transcript Downregulated
6357942 cristae atp coupling Downregulated
6357944 2’ deoxyribonucleotide deoxyribose Downregulated
6357820 oncogenic hydroxyglutarate hydroxyglutaric Downregulated
6357822 centrosome kinetochores recruitment Downregulated
6357824 glycogenosis ic gluconeogenesis Downregulated
6357952 glycosphingolipid catabolism glycolipid Downregulated
6357826 attachment spindle chromosome Downregulated
6357954 mismatch repair mmr Downregulated
6357890 novo nucleotides de Downregulated
6357956 metabolic epileptic disorders Downregulated
6357830 regulation separation positive Downregulated
6357898 pyrimidine metabolic process Downregulated
6357962 ketone quinone metabolic Downregulated
6357834 replication fidelity dna Downregulated

6.6 Creating a Theme EM

With some initial exploration on which terms based the thresholding and the filtering, we will complete building the enrichment map by creating a theme EM. This will involve collapsing the network to create a more concise visual representation of the network.

Table 6.5: Collapsed nodes in the Enrichment Map.
NODE_LABEL NODE_FILL_COLOR SUID
regulation fat differentiation #B2182B 6357936
bmp receptor signaling #F0F0F0 6359570
hydroxyglutric succinic hyperinsulinism #2166AC 6357938
mrna deadenylation destabilization #B2182B 6357906
apc mediated degradation #F0F0F0 6358708
amino acid proteinogenic #F0F0F0 6359732
mhc class ii #2166AC 6357910
bisphosphate process acyl #F0F0F0 6359032
drug adme apap #F0F0F0 6359384
transforming growth stimulus #B2182B 6357914
oxidation lipid fatty #F0F0F0 6359290
pathogen interaction human #F0F0F0 6359196
chromatid separation positive #F0F0F0 6359643
fate specification determination #B2182B 6357920
mirna regulation ncrna #B2182B 6357888
protein mitochondrion localization #2166AC 6357892
epidermis epidermal epithelial #B2182B 6357926
ventricular septum morphogenesis #B2182B 6357894
mitotic cytokinesis cytoskeleton #2166AC 6357928
glycosphingolipid catabolism glycolipid #F0F0F0 6359464
regulation tnfr1 induced #B2182B 6357900
Theme Network of GSEA Results.

Figure 6.3: Theme Network of GSEA Results.

6.7 Insights

1. Create an enrichment map.

I created two enrichment maps with different thresholds because I wanted to see the effects that tuning thresholds would have, in comparison to the default values. The second enrichment map with more stringent thresholds is Figure 6.1, but is also shown below.

Stringent Enrichment Map of GSEA Results Comparing SARS-CoV-2-infected and Healthy ACE549 cells treated with CBD (p = 1.0, q = 0.01, similarity = 0.375)

Figure 6.4: Stringent Enrichment Map of GSEA Results Comparing SARS-CoV-2-infected and Healthy ACE549 cells treated with CBD (p = 1.0, q = 0.01, similarity = 0.375)

2. How many nodes and how many edges are in your enrichment map?

The initial network has 1444 nodes and 13256 edges. On the other hand, the stringent network has 864 nodes and 9370 edges.

3. What thresholds did you use to create your enrichment map?

To create the enrichment maps, I used the following thresholds:

Network Version P-Value Threshold Q-Value Threshold Similarity Threshold
Initial 1.0 0.05 0.375
Stringent 1.0 0.01 0.375

4. What parameters did you use to annotate the network?

For annotation, I used AutoAnnotate because of how simple it is to use and integrate into the Cytoscape workflow11. For the annotations, I ran both the autoannotate annotate-clusterBoosted and autoannotate annotate-sizeSorted commands to compare the output, but decided to proceed with just autoannotate annotate-clusterBoosted.

For the autoannotate annotate-clusterBoosted command, I used the default parameters:

  • adjacentWordBonus: 8
  • clusterAlgorithm: MCL
  • maxWords: 3
  • minWordOccurrence: 1

The result of autoannotate is shown in Figure 6.5.

An Annotated Enrichment Map (Auto) of GSEA Results Comparing SARS-CoV-2-infected and Healthy ACE549 cells treated with CBD

Figure 6.5: An Annotated Enrichment Map (Auto) of GSEA Results Comparing SARS-CoV-2-infected and Healthy ACE549 cells treated with CBD

5. Create a publication ready figure.

An issue with the figures above is that there is no clear theme due to the size of the network and all the distinct groupings. To address this, we can focus on a subset of the network that focuses on one of the major themes. With a more focused network, we can also consider additions like a legend which would be necessary for something that’s publication-level. The figure below is in .svg format to allow for easy editing and scaling, which is often used for publication-quality networks.

Annotated (with legend) Theme Network of GSEA Results Comparing SARS-CoV-2-infected and Healthy ACE549 cells treated with CBD

Figure 6.6: Annotated (with legend) Theme Network of GSEA Results Comparing SARS-CoV-2-infected and Healthy ACE549 cells treated with CBD

Note that this figure was generated via automating Cytoscape through CyREST. The apc mediated degredation node was highlighted by using RCy3::setNodePropertyBypass() and RCy3::removeNodeCustomGraphics(), as it is the “theme of interest”. However, this was not clearly mentioned in the documentation/resources for CyREST/Cytoscape, hence this automation was the result of trial and error (i.e., the property being NODE_FILL_COLOR instead of Fill Color in certain Cytoscape interfaces).

6. Collapse your network to create a theme network.

See Figure 6.6.

7. What are the major themes present in this analysis? Do they fit with the model?

One of the major themes in Figure 6.6 is the apc mediated degredation pathway. In the condensed figure above, it has the most connections to the other nodes.

8. Are there any novel pathways or themes in this analysis?

None that are immediately apparent.

6.8 Interpretation

1. Do the enrichment results support conclusions or mechanism discussed in the original paper?

One of the key findings of this analysis and inspecting the final EM is that the apc mediated degredation pathway forms the largest cluster of related terms. The original publication does not outwardly discuss this pathway or the role of antigen-presenting cells (APCs), however, it is generally known that APCs play a crucial role in the immune response to infection- be it pathogenic or tumor-related12.

Furthermore, the publication discusses the role of the ER stress response, and how CBD has effects on genes that are enduced by this response4. If cells undergo too much oxidative stress, they can undergo apoptosis. I believe that antigen-presenting cells may factor in this apoptosis/degradation, especially if CBD cannot exert its “early” prevention of SARS-CoV-2 on the cells, likely leading to that response as a result.

2. Can you find other studies to support some of the results that you see? How does this evidence support your result?

One article discusses the mechanisms of the mRNA vaccines from different pharma companies during the COVID-19 pandemic, and it explains how the vaccine acts in relation to other phenomena. A key portion that connects to the idea of apc mediated degredation is their mention of how the mRNA vaccines are taken up by APCs, leading to an immune response12. I think this mechanism could support my previous interpretation in that the APC-induced response will lead to degredation of infected cells (possibly even those with the vaccine, despite it being attenuated).

7 Post-Analysis

8 References

1. Atalay S, Jarocka-Karpowicz I, Skrzydlewska E. Antioxidative and anti-inflammatory properties of cannabidiol. Antioxidants. 2019;9(1):21. http://dx.doi.org/10.3390/antiox9010021. doi:10.3390/antiox9010021
2. Cassano T, Villani R, Pace L, Carbone A, Bukke VN, Orkisz S, Avolio C, Serviddio G. From cannabis sativa to cannabidiol: Promising therapeutic candidate for the treatment of neurodegenerative diseases. Frontiers in Pharmacology. 2020;11. http://dx.doi.org/10.3389/fphar.2020.00124. doi:10.3389/fphar.2020.00124
3. Bonaccorso S, Ricciardi A, Zangani C, Chiappini S, Schifano F. Cannabidiol (CBD) use in psychiatric disorders: A systematic review. NeuroToxicology. 2019;74:282–298. http://dx.doi.org/10.1016/j.neuro.2019.08.002. doi:10.1016/j.neuro.2019.08.002
4. Nguyen LC, Yang D, Nicolaescu V, Best TJ, Gula H, Saxena D, Gabbard JD, Chen S-N, Ohtsuki T, Friesen JB, et al. Cannabidiol inhibits SARS-CoV-2 replication through induction of the host ER stress and innate immune responses. Science Advances. 2022;8(8). http://dx.doi.org/10.1126/sciadv.abi6110. doi:10.1126/sciadv.abi6110
5. Wickham H. ggplot2: Elegant graphics for data analysis. Springer-Verlag New York; 2016. https://ggplot2.tidyverse.org
6. Robinson MD, McCarthy DJ, Smyth GK. edgeR: A bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26(1):139–140. doi:10.1093/bioinformatics/btp616
7. Kolberg L, Raudvere U, Kuzmin I, Vilo J, Peterson H. gprofiler2– an r package for gene list functional enrichment analysis and namespace conversion toolset g:profiler. F1000Research. 2020;9 (ELIXIR)(709).
8. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences. 2005;102(43):15545–15550. http://dx.doi.org/10.1073/pnas.0506580102. doi:10.1073/pnas.0506580102
9. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T. Cytoscape: A software environment for integrated models of biomolecular interaction networks. Genome Research. 2003;13(11):2498–2504. http://dx.doi.org/10.1101/gr.1239303. doi:10.1101/gr.1239303
10. Otasek D, Morris JH, Bouças J, Pico AR, Demchak B. Cytoscape automation: Empowering workflow-based network analysis. Genome Biology. 2019;20(1). http://dx.doi.org/10.1186/s13059-019-1758-4. doi:10.1186/s13059-019-1758-4
11. Kucera M, Isserlin R, Arkhangorodsky A, Bader GD. AutoAnnotate: A cytoscape app for summarizing networks with semantic annotations. F1000Research. 2016;5:1717. http://dx.doi.org/10.12688/f1000research.9090.1. doi:10.12688/f1000research.9090.1
12. Rastogi I, Jeon D, Moseman JE, Muralidhar A, Potluri HK, McNeel DG. Role of b cells as antigen presenting cells. Frontiers in Immunology. 2022;13. http://dx.doi.org/10.3389/fimmu.2022.954936. doi:10.3389/fimmu.2022.954936
13. Rijkers GT, Weterings N, Obregon-Henao A, Lepolder M, Dutt TS, Overveld FJ van, Henao-Tamayo M. Antigen presentation of mRNA-based and virus-vectored SARS-CoV-2 vaccines. Vaccines. 2021;9(8):848. http://dx.doi.org/10.3390/vaccines9080848. doi:10.3390/vaccines9080848
14. Xie Y. Bookdown: Authoring books and technical documents with r markdown. 2023. https://github.com/rstudio/bookdown
15. Morgan M, Ramos M. BiocManager: Access the bioconductor project package repository. 2023. https://CRAN.R-project.org/package=BiocManager
16. Xie Y. Knitr: A general-purpose package for dynamic report generation in r. 2023. https://yihui.org/knitr/
17. Gustavsen, A. J, Pai, Shraddha, Isserlin, Ruth, Demchak, Barry, Pico, R. A. RCy3: Network biology using cytoscape from within r. F1000Research. 2019. doi:10.12688/f1000research.20887.3